Distance-based adaptive k-neighborhood selection
Authors
Abstract
The k-nearest neighbor classifier follows a simple yet powerful algorithm: collect the k data points closest to an unlabeled instance, according to a given distance measure, and use them to predict that instance's label. Two components essentially determine the success or failure of the classifier: the parameter k, which governs the size of the neighborhood used, and the distance measure.
In this work, we propose to reverse the use of outlier-detection techniques that are based on k-neighborhoods in order to determine the value of k. To achieve this, we invert the workings of these techniques: instead of using a fixed k to decide whether an instance is an outlier, we stop growing the k-neighborhood as soon as the unlabeled instance would be given outlier status. We derive a number of criteria from different neighborhood-based outlier-detection techniques. With the exception of one technique, our approaches have low complexity and running times.
In our experiments, we compare against two recently proposed techniques from the field that have more sophisticated theoretical foundations, as well as against two well-established kNN classifiers. We find that our approaches are competitive with existing work and, in particular, that the recent techniques do not constitute an improvement.
CR Subject Classification: I.2, H.2.8
Albrecht Zimmermann ([email protected]), KU Leuven, Celestijnenlaan 200A, Leuven, B-3001 Belgium
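The stopping rule described in the abstract — grow the neighborhood until the query itself would be flagged as an outlier — can be sketched as follows. This is a minimal illustration only: the distance-gap criterion, the function name `adaptive_knn_predict`, and the parameters `k_max` and `ratio_threshold` are assumptions made for this sketch, not one of the criteria the paper derives from existing outlier detectors.

```python
import math
from collections import Counter

def adaptive_knn_predict(X, y, query, k_max=20, ratio_threshold=2.0):
    """Predict a label by growing the k-neighborhood of `query` until the
    query would be given outlier status under a simple distance-gap test.

    Illustrative criterion (not from the paper): stop growing once the
    distance to the next-nearest neighbor exceeds `ratio_threshold` times
    the mean distance to the neighbors collected so far.
    """
    # Sort all training points by distance to the query.
    dists = sorted((math.dist(query, x), label) for x, label in zip(X, y))

    k = 1
    while k < min(k_max, len(dists)):
        current = [d for d, _ in dists[:k]]       # distances to current neighbors
        next_dist = dists[k][0]                   # distance to candidate neighbor k+1
        mean_current = sum(current) / len(current)
        if mean_current > 0 and next_dist > ratio_threshold * mean_current:
            break  # adding this neighbor would make the query look like an outlier
        k += 1

    # Majority vote over the adaptively sized neighborhood.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In this sketch the neighborhood stops growing at the first large jump in neighbor distance, so a query deep inside a cluster keeps only same-cluster neighbors and never averages in points from a distant cluster, regardless of any fixed k.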
Similar resources
Geometry-Aware Neighborhood Search for Learning Local Models for Image Reconstruction
Local learning of sparse image models has proven to be very effective to solve inverse problems in many computer vision applications. To learn such models, the data samples are often clustered using the K-means algorithm with the Euclidean distance as a dissimilarity metric. However, the Euclidean distance may not always be a good dissimilarity measure for comparing data samples lying on a mani...
The Time Adaptive Self Organizing Map for Distribution Estimation
The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...
Parameterless Isomap with Adaptive Neighborhood Selection
Isomap is a highly popular manifold learning and dimensionality reduction technique that effectively performs multidimensional scaling on estimates of geodesic distances. However, the resulting output is extremely sensitive to parameters that control the selection of neighbors at each point. To date, no principled way of setting these parameters has been proposed, and in practice they are often...
An Adaptive LEACH-based Clustering Algorithm for Wireless Sensor Networks
LEACH is the most popular clustering algorithm in Wireless Sensor Networks (WSNs). However, it has two main drawbacks: random selection of cluster heads, and direct communication of cluster heads with the sink. This paper aims to introduce a new centralized cluster-based routing protocol named LEACH-AEC (LEACH with Adaptive Energy Consumption), which guarantees to generate balanced cl...
A Robust Competitive Global Supply Chain Network Design under Disruption: The Case of Medical Device Industry
In this study, an optimization model is proposed to design a Global Supply Chain (GSC) for a medical device manufacturer under disruption in the presence of pre-existing competitors and price inelasticity of demand. Therefore, static competition between the distributors’ facilities to more efficiently gain a further share in market of Economic Cooperation Organization trade agreement (ECOTA) is...